Sharp Detection in PCA under Correlations: All Eigenvalues Matter

Author

  • Edgar Dobriban
Abstract

Principal component analysis (PCA) is a widely used method for dimension reduction. In high dimensional data, the “signal” eigenvalues corresponding to weak principal components (PCs) do not necessarily separate from the bulk of the “noise” eigenvalues. Therefore, popular tests based on the largest eigenvalue have little power to detect weak PCs. In the special case of the spiked model, certain tests asymptotically equivalent to linear spectral statistics (LSS)—averaging effects over all eigenvalues—were recently shown to achieve some power. We consider a nonparametric “local alternatives” generalization of the spiked model to the setting of Marchenko and Pastur (1967). This allows a general correlation structure even under the null hypothesis of no significant PCs. We develop new tests to detect weak PCs in this model. We show using the CLT for LSS that the optimal LSS satisfy a Fredholm integral equation of the first kind. We develop algorithms to solve it, building on our recent method for computing the limit empirical spectrum. Our analysis relies on the new concept of the weak derivative of the Marchenko-Pastur map of eigenvalues, which also leads to a new perspective on phase transitions in spiked models.
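To make the abstract's point about weak PCs concrete, here is a minimal numerical sketch (not the paper's procedure): in the spiked covariance model Sigma = I + h·vv', the largest sample eigenvalue separates from the Marchenko-Pastur bulk edge (1 + sqrt(gamma))^2 only when h exceeds sqrt(gamma), which is why largest-eigenvalue tests have little power against sub-critical spikes. The dimensions, spike strengths, and random seed below are illustrative assumptions.

```python
# Illustrative simulation of the BBP-type phase transition in a spiked model.
import numpy as np

rng = np.random.default_rng(0)
n, p = 800, 400                          # samples, dimension
gamma = p / n                            # aspect ratio
bulk_edge = (1 + np.sqrt(gamma)) ** 2    # Marchenko-Pastur upper edge

def largest_sample_eigenvalue(h):
    """Largest eigenvalue of the sample covariance under Sigma = I + h*e1*e1'."""
    X = rng.standard_normal((n, p))
    X[:, 0] *= np.sqrt(1.0 + h)          # inject the spike along the first coordinate
    S = X.T @ X / n
    return np.linalg.eigvalsh(S)[-1]

for h in (0.0, 0.4, 2.0):                # null, sub-critical, super-critical spike
    lam1 = largest_sample_eigenvalue(h)
    if h > np.sqrt(gamma):
        predicted = (1 + h) * (1 + gamma / h)   # limit above the transition
    else:
        predicted = bulk_edge                   # spike is absorbed by the bulk
    print(f"h = {h:.1f}: largest eigenvalue {lam1:.3f}, predicted limit {predicted:.3f}")
```

For the sub-critical value of h the observed top eigenvalue stays essentially at the bulk edge, which is the regime where the paper's LSS-based tests are designed to retain power.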


Similar articles

APPLICATION OF THE RANDOM MATRIX THEORY ON THE CROSS-CORRELATION OF STOCK PRICES

The analysis of cross-correlations is extensively applied to understand interconnections in stock markets. A variety of methods is used to study stock cross-correlations, including Random Matrix Theory (RMT), Principal Component Analysis (PCA), and hierarchical structures. In this work, we analyze cross-correlations between price fluctuations of 20 company stocks...
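A hedged illustration of the RMT comparison this abstract describes: eigenvalues of the sample correlation matrix of purely random returns should fall inside the Marchenko-Pastur support [(1 - sqrt(q))^2, (1 + sqrt(q))^2] with q = p/T, so eigenvalues of real stock returns beyond the upper edge are the usual candidates for genuine market or sector modes. The synthetic returns below are an assumption standing in for price data, which is not provided here.

```python
# Compare sample correlation eigenvalues with the Marchenko-Pastur support.
import numpy as np

rng = np.random.default_rng(1)
T, p = 1000, 20                          # T return observations for p stocks
q = p / T
lower_edge = (1 - np.sqrt(q)) ** 2
upper_edge = (1 + np.sqrt(q)) ** 2

returns = rng.standard_normal((T, p))            # placeholder for log-returns
standardized = (returns - returns.mean(0)) / returns.std(0)
corr = standardized.T @ standardized / T         # sample correlation matrix
eigs = np.linalg.eigvalsh(corr)

outside = eigs[(eigs < lower_edge) | (eigs > upper_edge)]
print(f"MP support: [{lower_edge:.3f}, {upper_edge:.3f}]")
print(f"eigenvalues outside the MP bulk: {len(outside)} of {p}")
```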


Correlation of Data Reconstruction Error and Shrinkages in Pair-wise Distances under Principal Component Analysis (PCA)

In this ongoing work, I explore certain theoretical and empirical implications of data transformations under PCA. In particular, I state and prove three theorems about PCA, which I paraphrase as follows: 1) PCA without discarding eigenvector rows is injective, but loses this injectivity when eigenvector rows are discarded; 2) PCA without discarding eigenvector rows preserves pair-wise ...
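A small numerical check of the two properties paraphrased above (my own sketch, not the paper's proofs): keeping all principal components is an orthogonal change of basis, so pairwise distances are preserved exactly, while discarding components can only shrink them. The data and dimensions below are arbitrary illustrative choices.

```python
# Pairwise distances before and after PCA, with and without discarded components.
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 10)) @ rng.standard_normal((10, 10))  # correlated data
Xc = X - X.mean(axis=0)

# Principal directions from the eigendecomposition of the sample covariance.
_, vecs = np.linalg.eigh(Xc.T @ Xc / len(Xc))
vecs = vecs[:, ::-1]                     # reorder by decreasing eigenvalue

full = Xc @ vecs                         # PCA scores with all 10 components kept
reduced = Xc @ vecs[:, :3]               # only the top 3 components kept

def pairwise_distances(Z):
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

d_orig = pairwise_distances(Xc)
d_full = pairwise_distances(full)
d_red = pairwise_distances(reduced)
print("max distortion with all components kept:", np.abs(d_full - d_orig).max())
print("no pairwise distance grows with top 3  :", bool(np.all(d_red <= d_orig + 1e-9)))
```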


Finite Sample Approximation Results for Principal Component Analysis: a Matrix Perturbation Approach

Principal Component Analysis (PCA) is a standard tool for dimensional reduction of a set of n observations (samples), each with p variables. In this paper, using a matrix perturbation approach, we study the non-asymptotic relation between the eigenvalues and eigenvectors of PCA computed on a finite sample of size n, and those of the limiting population PCA as n → ∞. As in machine learning, we pr...


Finite Sample Approximation Results for Principal Component Analysis: a Matrix Perturbation Approach, by Boaz Nadler

Principal component analysis (PCA) is a standard tool for dimensional reduction of a set of n observations (samples), each with p variables. In this paper, using a matrix perturbation approach, we study the nonasymptotic relation between the eigenvalues and eigenvectors of PCA computed on a finite sample of size n, and those of the limiting population PCA as n→∞. As in machine learning, we pres...
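A rough empirical companion to these two abstracts (not the paper's bounds): for a fixed population covariance with well separated eigenvalues, the error of the leading sample eigenvalue and eigenvector shrinks as the sample size n grows, roughly at the 1/sqrt(n) rate suggested by perturbation arguments. The population spectrum chosen below is an arbitrary illustration.

```python
# Sample vs. population PCA as the sample size grows.
import numpy as np

rng = np.random.default_rng(3)
p = 5
pop_eigs = np.array([10.0, 4.0, 2.0, 1.0, 0.5])       # population eigenvalues
Q, _ = np.linalg.qr(rng.standard_normal((p, p)))      # random population eigenbasis
Sigma_half = Q @ np.diag(np.sqrt(pop_eigs)) @ Q.T     # square root of the covariance
v1 = Q[:, 0]                                          # leading population eigenvector

for n in (50, 500, 5000):
    X = rng.standard_normal((n, p)) @ Sigma_half
    S = X.T @ X / n
    vals, vecs = np.linalg.eigh(S)
    lam_err = abs(vals[-1] - pop_eigs[0])
    angle = np.arccos(min(1.0, abs(vecs[:, -1] @ v1)))
    print(f"n = {n:5d}: |lam1 - 10| = {lam_err:.3f}, eigenvector angle = {angle:.3f} rad")
```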


An Image Splicing Detection Method Based on PCA Minimum Eigenvalues

This paper presents a novel and effective image splicing forgery detection method based on the inconsistency of irrelevant components between the original and the tampered regions. The specific irrelevant components can be described by the minimum eigenvalues obtained by the principal component analysis (PCA) without knowing any prior information. To avoid the impact of local structures, a pixe...
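A hedged sketch of the basic feature this abstract describes (not the paper's full pipeline): split a grayscale image into blocks, run PCA on the small pixel patches inside each block, and record the minimum eigenvalue, which tracks the local noise level; a region pasted in from an image with different noise tends to produce inconsistent minimum eigenvalues. The synthetic "spliced" image, block size, and patch size below are assumptions used purely for illustration.

```python
# Per-block minimum PCA eigenvalues as a local noise-consistency feature.
import numpy as np

rng = np.random.default_rng(4)
H = W = 64
B = 16                                          # analysis block size
K = 3                                           # patch size used for the local PCA

image = rng.normal(128.0, 2.0, (H, W))                   # host image: low noise
image[16:40, 16:40] += rng.normal(0.0, 8.0, (24, 24))    # pasted-in noisier region

def min_pca_eigenvalue(block):
    """Smallest eigenvalue of the PCA of all KxK patches in the block."""
    patches = np.array([
        block[r:r + K, c:c + K].ravel()
        for r in range(block.shape[0] - K + 1)
        for c in range(block.shape[1] - K + 1)
    ])
    centered = patches - patches.mean(axis=0)
    cov = centered.T @ centered / len(centered)
    return np.linalg.eigvalsh(cov)[0]           # tracks the local noise variance

grid = np.array([
    [min_pca_eigenvalue(image[i:i + B, j:j + B]) for j in range(0, W, B)]
    for i in range(0, H, B)
])
print("per-block minimum PCA eigenvalues (larger values flag the noisier, spliced area):")
print(np.round(grid, 1))
```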




Journal:

Volume   Issue

Pages  -

Publication date: 2016